Multi-source face tracking with audio and visual data
Authors
Abstract
A real-time face tracker based on both sound and visual cues is presented. Initial talker locations are estimated acoustically from microphone-array data, while precise localization and tracking are derived from visual data. The image processing uses a hierarchical structure that exploits source motion, contour geometry, color data, and facial features. The resulting system can track multiple people against complex backgrounds and robustly discriminate faces from similar objects. Although the direct focus of this work is automated video conferencing, the face-tracking capability is useful in many multimedia and virtual-reality applications.
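The abstract does not specify the acoustic or visual algorithms. As a rough illustration only, the sketch below assumes a single pair of microphones and uses cross-correlation time-difference-of-arrival (TDOA), a standard technique swapped in here, to obtain a coarse talker bearing. It then passes image candidate regions through a hypothetical cue hierarchy in the order the abstract lists them (motion, contour geometry, color, facial features), gated by that bearing. The `Candidate` fields, all thresholds, the 15° window, and the function names are invented for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass

import numpy as np

# Assumed speed of sound in room-temperature air (m/s); not from the paper.
SPEED_OF_SOUND = 343.0


def estimate_bearing(x_left, x_right, fs, mic_spacing):
    """Coarse acoustic bearing for one microphone pair via cross-correlation TDOA.

    Returns (bearing in degrees from broadside, lag in samples). A positive
    bearing means the right channel lags the left one, i.e. the source is
    nearer the left microphone.
    """
    # Largest inter-microphone delay (in samples) that is physically possible.
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * fs))
    # corr[i] = sum_n x_right[n + k] * x_left[n], where k = i - (len(x_left) - 1).
    corr = np.correlate(x_right, x_left, mode="full")
    lags = np.arange(-(len(x_left) - 1), len(x_right))
    valid = np.abs(lags) <= max_lag
    best_lag = int(lags[valid][np.argmax(corr[valid])])
    # Far-field model: extra path length to the farther mic = spacing * sin(theta).
    sin_theta = np.clip(SPEED_OF_SOUND * (best_lag / fs) / mic_spacing, -1.0, 1.0)
    return math.degrees(math.asin(sin_theta)), best_lag


@dataclass
class Candidate:
    """Hypothetical image region produced by some earlier segmentation step."""
    bearing_deg: float         # horizontal direction of the region, same convention as above
    motion: float              # fraction of region pixels that changed between frames
    ellipse_fit: float         # 0..1 agreement of the region contour with a head-like ellipse
    skin_fraction: float       # fraction of pixels classified as skin-colored
    has_facial_features: bool  # output of some eye/mouth detector


def find_faces(candidates, talker_bearing_deg, window_deg=15.0):
    """Apply the cues in sequence; each stage only sees survivors of the previous one."""
    stages = [
        lambda c: abs(c.bearing_deg - talker_bearing_deg) <= window_deg,  # acoustic prior
        lambda c: c.motion > 0.05,        # source motion
        lambda c: c.ellipse_fit > 0.6,    # contour geometry
        lambda c: c.skin_fraction > 0.4,  # color
        lambda c: c.has_facial_features,  # facial features
    ]
    survivors = list(candidates)
    for stage in stages:
        survivors = [c for c in survivors if stage(c)]
    return survivors
```

For example, with a 16 kHz sample rate, a 0.2 m spacing, and the right channel delayed by 5 samples, the lag is recovered as 5 and the bearing is about 32°; a face-like poster at -40° is then rejected by the acoustic gate, and a moving hand near the talker is rejected by the contour test. Running the cheap tests before the expensive ones is a design choice of this sketch, not something the abstract states.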
Similar resources
Speaker Tracking Using an Audio-visual Particle Filter
We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...
An embedded audio-visual tracking and speech purification system on a dual-core processor platform
Design of an embedded audio–visual tracking and speech purification system is described in this paper. The system is able to perform human face tracking, voice activity detection, sound source direction estimation, and speech enhancement in real-time. Estimating the sound source directions helps to initialize the human face tracking module when the target changes the direction. The implementati...
Audiovisual-based adaptive speaker identification
An adaptive speaker identification system is presented in this paper, which aims to recognize speakers in feature films by exploiting both audio and visual cues. Specifically, the audio source is first analyzed to identify speakers using a likelihood-based approach. Meanwhile, the visual source is parsed to recognize talking faces using face detection/recognition and mouth tracking techniques. ...
An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR'06 Evaluation Dataset
We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...
Audio-visual speaker localization for car navigation systems
Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that the integration of video and audio information can significantly improve dialog system performance, since the visual modality is not impacted by acoustic noise. In this paper, we propose a robust audio-visual integra...